Adaptive Sampling with Optimal Cost for Class-Imbalance Learning

Author

  • Yuxin Peng
Abstract

Learning from imbalanced data sets, in which negative examples far outnumber positive examples, is one of the challenging problems in machine learning. Existing methods have three main problems: (1) the degree of re-sampling, a key factor that greatly affects performance, must be fixed in advance, and the optimal value is difficult to choose; (2) many useful negative samples are discarded in under-sampling; (3) the effectiveness of algorithm-level methods is limited because they use only the original training data for a single classifier. To address these issues, this paper proposes a novel approach of adaptive sampling with optimal cost for class-imbalance learning. The novelty of the approach lies mainly in adaptively over-sampling the minority positive examples and under-sampling the majority negative examples, forming different sub-classifiers from different subsets of the training data with the best cost ratio adaptively chosen, and combining these sub-classifiers according to their accuracy to create a strong classifier. It aims to make full use of the whole training data and improve the performance of the class-imbalance learning classifier. Experiments compare the proposed approach with 12 state-of-the-art methods on 16 challenging UCI data sets using 3 evaluation metrics, and the results show that the proposed approach achieves superior performance in class-imbalance learning.
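
The pipeline outlined in the abstract (re-sample both classes, pick a cost ratio per sub-classifier, then combine the sub-classifiers by weight) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not the paper's method: the nearest-centroid base learner `make_stub`, the random subset size standing in for the adaptive sampling rule, the `cost_grid` values, and the use of G-mean both to select the cost ratio and to weight each sub-classifier. The abstract does not specify any of these details.

```python
import random
from math import dist, sqrt

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return sqrt((tp / pos) * (tn / neg)) if pos and neg else 0.0

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def make_stub(pos, neg, cost):
    """Hypothetical base learner: nearest centroid, where `cost` scales the
    distance to the negative centroid so that cost > 1 favours positives."""
    mu_p, mu_n = centroid(pos), centroid(neg)
    return lambda x: 1 if dist(x, mu_p) <= cost * dist(x, mu_n) else 0

def fit_ensemble(X, y, rounds=5, cost_grid=(0.5, 1.0, 2.0, 4.0), seed=0):
    """Assumes label 1 is the minority class and label 0 the majority."""
    rng = random.Random(seed)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    ensemble = []
    for _ in range(rounds):
        # Stand-in for adaptive sampling: draw a common subset size, then
        # under-sample negatives and over-sample positives (with replacement).
        size = rng.randint(len(pos), len(neg))
        neg_sub = rng.sample(neg, size)
        pos_sub = [rng.choice(pos) for _ in range(size)]
        # Keep the cost ratio whose classifier scores best on the training set.
        best = max(
            (make_stub(pos_sub, neg_sub, c) for c in cost_grid),
            key=lambda clf: g_mean(y, [clf(x) for x in X]),
        )
        weight = g_mean(y, [best(x) for x in X])
        ensemble.append((weight, best))
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote over the sub-classifiers."""
    total = sum(w for w, _ in ensemble)
    vote = sum(w * clf(x) for w, clf in ensemble)
    return 1 if total > 0 and vote >= total / 2 else 0
```

Because every round draws a different negative subset, most of the majority-class data is seen by some sub-classifier, which is the stated motivation for not fixing a single under-sampled set in advance.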

Similar resources

An Optimized Cost-Free Learning Using ABC-SVM Approach in the Class Imbalance Problem

In this work, cost-free learning (CFL) is formally defined in comparison with cost-sensitive learning (CSL). The primary difference between them is that, even in the class imbalance problem, a CFL approach provides optimal classification results without requiring any cost information. In fact, several CFL approaches exist in the related studies, such as sampling and some criteria-based approach...

Measuring Accuracy between Ensemble Methods: AdaBoost.NC vs. SMOTE.ENN

The imbalanced class distribution is one of the main issues in data mining. This problem exists in multi-class imbalance, when one class contains more or fewer samples than the other classes. Most existing imbalance learning techniques are only designed and tested for two-class scenarios. The new negative correlation learning (NCL) algorithm for classification ensembles, called A...

Dealing with Multiple Classes in Online Class Imbalance Learning

Online class imbalance learning deals with data streams having very skewed class distributions in a timely fashion. Although a few methods have been proposed to handle such problems, most of them focus on two-class cases. Multi-class imbalance imposes additional challenges in learning. This paper studies the combined challenges posed by multi-class imbalance and online learning, and aims at a mo...

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Class imbalance classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances represent the concept with greater in...

A comparative study on rough set based class imbalance learning

This paper performs systematic comparative studies on rough set based class imbalance learning. We compare the strategies of weighting, re-sampling and filtering used in the rough set based methods for class imbalance learning. Weighting is better than re-sampling, and re-sampling is better than filtering. The weighted rough set based method achieves the best performance in class imbalance lear...

Publication date: 2015